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ABSTRACT 

Radio observations establish the B/A magnification ratio of gravitational lens 
0957+561 at about 0.75. Yet, for more than 15 years, the optical magnfication 
ratio has been between 0.9 and 1.12. The accepted explanation is microlensing 
of the optical source. However, this explanation is mildly discordant with (i) the 
relative constancy of the optical ratio, and (ii) recent data indicating possible 
non-achromaticity in the ratio. To study these issues, we develop a statistical 
formalism for separately measuring, in a unified manner, the magnification 
ratio of the fluctuating and constant parts of the light curve. Applying the 
formalism to the published data of Kundic et al. (1997), we find that the 
magnification ratios of fluctuating parts in both the g and r colors agrees with 
the magnification ratio of the constant part in g-band, and tends to disagree 
with the r-band value. One explanation could be about 0.1 mag of consistently 
unsubtracted r light from the lensing galaxy Gl, which seems unlikely. Another 
could be that 0957+561 is approaching a caustic in the microlensing pattern. 



Subject headings: gravitational lenses - quasars: individual: 0957+561 



1. Introduction 



The gravitational lens system 0957+561 has by now been observed at optical and radio 
wavelengths for nearly twenty years (Walsh, Carswell, and Weymann 1979; Porcas et al. 
1979). Radio studies have definitively established that the B/A magnification ratio of the 
lens, measured at the core of the radio images (which lies at the location of the optical 
point source images), is close to 0.75; some recent measurements are 0.75 ± 0.02 (Garrett 
et al. 1994), 0.752 ± 0.028 (Conner et al., 1992). Note that while there is some controversy 
about the radio magnification ratio at the location of the radio jet, as opposed to the core 
(see, e.g., Garrett et al. 1994), only the core value interests us here. 
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Because the variability of the quasar in the optical is larger than in the radio, 
measurement of the B/A magnification ratio in the optical requires that the light curves 
be shifted by the correct time delay r before the ratio is taken. Thus, the earliest 
determinations of B/A were incorrect. For example, Young et al. (1980) obtained a ratio of 
0.76, comfortingly - yet erroneously - close to the radio magnification. 

However, at least from Vanderriest et al. (1989) on, who used a value for r quite 
close to definitive recent determinations (Kundic et al. 1997), it has been clear that the 
B/A magnification ratio of the optical continuum in the B and A point sources is quite 
different from 0.75, and moreover has remained at least fairly constant for the full history of 
observation. Smoothing over observing seasons, Vanderriest et al. (1989) obtained a B/A 
ratio varying between about 0.9 and 1.05 over the observing seasons early-1980 through 
early-1986 (times referenced to A component), with a single best-fit value of 0.97. It is 
debatable whether the variation around the best-fit value is actual time variation of the 
lens magnification ratio (as distinct from time variation in the quasar luminosity, n.b.) or 
observational artifact. However, it does seem quite likely that the magnification ratio varied 
by no more than about ±8% during this time. 

More recently, the value recently obtained by Kundic et al. (1995, 1997) for the 1995 
season (A component) is 1.12 ± 0.01 in g-band (with the error bar, a 95% confidence limit, 
depending somewhat on the method of reduction used). So, it is quite plausible (and not 
contradicted by other measurements in the literature) that the optical B/A remained in 
the range 0.9 to 1.12 from 1980 through 1995, and possible that the variation has been 
considerably smaller than this range. 

The discrepancy between the optical and radio magnification ratios has long been 
understood as due to microlensing (as predicted by Chang and Refsdal 1979, and Gott 
1981). The proper radius of the Einstein ring from a 0.5 M star at the lens galaxy 
redshift z = 0.36, illuminated by the quasar at redshift z = 1.41, is about 2 x 10 16 /i~ 1//2 cm. 
Since the radio emission region is much larger than this scale, it averages spatially over 
the microlensing pattern and is magnified by the macrolens ratio of 0.75. If the optical 
magnification indeed differs by ~ 30% from the macrolens value, then the optically emitting 
region must be smaller, or at most a few times larger, than the Einstein ring scale. This 
accords nicely with (e.g.) the size of an accretion disk smaller than 100 Schwarzschild radii 
around a 1O 9 M black hole (a scale of 3 x 10 16 cm). 

This Einstein ring radius is only marginally, however, in accord with the apparent 
constancy of the microlensed magnification ratio: Since the Earth, the microlensing star 
(or stars, the effect being collective), and the quasar each have (3-dimensional) peculiar 
velocities of at least 300 km/s, the Earth should move through ~ 100% microlensing 
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variations in ~ 10 yr (see Kochanek, Kolatt, and Bartelmann, 1996 for related calculations). 
So, the observed microlensing is about a factor 10 too constant, and one is invited to 
speculate on whether something other than luck is the reason. 

Another invitation to speculation is the fact that Kundic et al. obtain rather different 
magnification ratios in their r- and g-band data, with the r ratio being 1.22 ± 0.02, with, 
again, the error bar depending on the method of analysis used. By any interpretation of the 
error bars, however, the r and g results are strongly discrepant. (Again note that there is 
no assumption that the fluctuations themselves have the same amplitude in the two colors, 
but only that the magnification ratio should be the same.) Either a full 0.1 mag of r-band 
galaxy light has escaped Kundic et al.'s careful subtraction in the B image, or something 
else is going on in the lens magnification ratio. 

With these two hovering peculiarities (possible excess time-constancy, and possible 
non-achromaticity), it seems useful to try to get additional information on the magnification 
ratio. This paper therefore asks the questions: Is the optical magnification ratio the 
same for the source region that produces the fluctuations in quasar light as it is for the 
source region that produces the constant light? And, does the magnification ratio of the 
fluctuations (which we may call the "AC" magnification ratio or "ACMR") agree more 
closely with the r- or g-band magnification ratio previously measured (here called the "DC" 
magnification ratio or "DCMR")? 

The answers to these questions can help diagnose the following situations: (i) If, as is 
true in many models, the size of the emitting region is much smaller than the microlensing 
scale, then all the magnification ratios should have the same value, (ii) If there is a problem 
with r-band galaxy subtraction - or any other constant source of flux added to one lens 
component and not the other - then the ACMR should represent the "true" microlens 
magnification ratio, and we might further expect it to be close to the g-band DCMR 
(where galaxy subtraction is a much smaller effect), (iii) If the optically emitting quasar 
accretion disk has a scale comparable to the microlensing scale, and has (as seems almost 
inevitable) color gradients, then the r and g ACMRs, and r and g DCMRs, might all be 
distinct. Indeed, the two ACMRs and two DCMRs then provide four distinct windows on 
the convolution of the accretion disk source with the microlensing pattern. 

Conceptually, one measures a DCMR and an ACMR as follows: Shift one of the light 
curves (A,B) in time by r to undo the lens delay. Fit each light curve by a constant value 
plus a residual time- varying part. The ratio of the constant values is the DCMR. Now, for 
the two time-varying residuals, fit for a model that makes the B residual a constant times 
the A residual. The best-fitting constant is the ACMR. 
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This conceptual formulation, while simple, is actually not quite right. In the next 
section, we will give a statistical formulation of the problem that is more complete, and 
also more directly applicable to unevenly sampled data. In Section 3, we discuss some 
implementation details, and in Section 4 we apply the formulation to the published data of 
Kundic et al. (1995, 1997). Section 5 is discussion and conclusions. 



2. Derivation of Statistical Method 

The spirit of this derivation is very much the same as described in Rybicki and Press 
(1992, hereafter RP92), which the reader might wish to consult at this point. For the 
purposes of calculation, we assume that the underlying fluctuating light curve is generated 
by a Gaussian process f{t). This is of course only an approximation, and we will comment 
in Section 3, below, on its limits of validity; but the Gaussian assumption provides a clean 
analytic framework. Moreover, it is actually quite hard to find evidence for non-Gaussianity 
in the 0957+561 light curve (see Press and Rybicki, 1997). 

The process fit) is completely characterized by its covariance 

S(fl,f2) = </(fl)/(*2)> (1) 

so that the x 2 of a series of (for now, noiseless) observations f(ti), ffo), • • • , /(tw) is 

X 2 = FS^f (2) 

where f is the vector of / values and S is the matrix of S values relating all pairs of times 
occuring in f . The relative probability of a given sequence f occuring is 

P(f)oc(detS)-V2 e xp(-i x 2 ) (3) 

We now model the observed A- and B-component light curves as 

a(t) = a + f(t) +n a (t) 

b(t) = b + Rf(t)+n b (t) (4) 

Here fit) is a Gaussian process, as above, n a and rib are the (assumed Gaussian) noise 
processes in the A and B measurements, respectively, and R is the desired B/A magnification 
ratio, the ACMR. (The DCMR would be b /a .) 

Evidently a(t) — a and b(t) — b are Gaussian processes. If a is a vector of A 
measurements, b is a vector of B measurements, then the x 2 value of a vector combining 
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both measurements is immediately 

where the matrix M has the block-partitioned form 

M= " R2s * (6) 

with N a and N the noise covariances (here and subsequently assumed to be diagonal), and 
the notation S a b (e.g.) indicating a matrix of covariance values S relating the times of A 
observations and the times of B observations. The associated Gaussian probability, where 
we now emphasize the dependence on the unknown parameters, is 

P(a, b|ao, bo, R) oc (det M)~ 1/2 exp (-^X 2 ) (7) 



2 



Noting that 



where 



we also have 



M=( 1 » c 1 
[or [or 



S aa + N a S a b 

\ T , 



(9) 



x = {r-^-r-%) c [r-^-r-%) (10) 

and 

det M = i? 2M det C (11) 
where M is the length of the vector b (the number of B-component data points). 

We now arrive at a somewhat subtle, though important, issue: We do not want to 
assume that the fluctuating process f(t) has zero mean. Indeed, physically, f(t) might 
be everywhere positive, since a process can emit positive photons but not negative ones. 
Thus, in the context of equation (0), we want to use an unbiased x 2 that is independent 
of moving some constant flux from f(t) and into ao and bo (in correct proportion). RP92 
showed how such a "Gauss-Markov" , unbiased process is obtained by adding a term AEE T 
to C and then letting A — > oo before calculating C _1 (see RP92 for explanation of this 
notation). Henceforth, we assume that this limit is always taken. In this case, x 2 becomes 
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independent of any constant added to the vectors in the quadratic form, and equation (|T(]) 
can be written as 

where 

c = R^bo - a (13) 

The point is that, for fluctuations not known to have zero mean, the three parameters 
R, clq, and 60, are not, even in principle, separately measurable. But the two parameters R 
and Co are measurable. We refer to Co as the "contamination", since in the ideal case that B 
alone is contaminated by spurious constant light (for example, an unsubtracted component 
of the lens galaxy's light) then Rc represents the spurious flux. 

It is possible to separate, analytically, the contribution to x 2 °f c o and R, as follows: 

Let 

HA)- i-(°)-«>,o,...,o,i,i,...,if (14) 

so that equation QT2j) becomes 



.2 



X 



(y-cbqfCT^y-coq) (15) 



Straightforward matrix "completion of the squares" shows that equation (|15|) is the same as 



X 2 = (q^q) ( co - +y T Hy (16) 



where 

c-wc- 1 

H = C " q^C-iq 
The probability associated with equation (|1^) is (using equations [7| and |ll| 



(17) 



P(a, b\R, c ) oc (R 2M det C)~ 1/2 exp ( ) < ■'■<' 



Equations ( [TBI) - (|TB|) could be used directly on data, to obtain (e.g.) maximum 
likelihood estimates of cq, which appears explicitly and only in one term, and R, which 
appears implicitly only through y. However, we want to take the more Bayesian viewpoint 
that Co is a nuisance variable that ought to be integrated over, rather than maximized. (In 
practice, we find that Co is generally so well determined at fixed R that it hardly matters 
whether we are Bayesians or not.) Bayes Theorem now gives the simple result 

P(R\a, b) oc J°° dc P(a, b\R, c ) oc R~ M exp (-^y r Hy) [ACMR] (19) 
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where, in the last equality, all constant factors that do not depend on R have been dropped. 
The constant of proportionality is determined, in Bayesian manner, by requiring that the 
integral of equation flT8| ) over R be unity. Equation flTB| ) is easily evaluated on a given 
data set a, b for a range of values of R, giving either Bayesians probabilities or confidence 
intervals for R, which is the AC magnification ratio (ACMR, see Introduction). 

What about the DC magnification ratio (DCMR)? From the above discussion, we 
now see that it is not directly measurable, without further assumption. The reason (to 
reiterate) is that some part of the light that is physically part of the fluctuating part, and 
might fluctuate in the future, may happen to be constant, or nearly so, over a finite data 
set. The closest thing to what the observer means by DCMR is obtained by setting c to 
zero in equation (|H|) and following equations. That is, the observer pre-corrects the data 
sets a, b for all known constant sources of error (galaxy light subtraction, and so on), then 
fits for the best single ratio R that directly relates the A and B data sets. The Bayesian 



probability corresponding to equation ([H]) is thus evidently, by equation fll5|) 



P(R\aL, b) oc R~ M exp (--y^V) [DCMR] (20) 

We will see that this DCMR is generally much better determined statistically than is 
the ACMR, but at the price of having unknown systematics (in the pre-correction of the 
data). The ACMR is much less well-determined, but is completely independent of such 
systematics. Thus, there is a synergy in computing both magnification ratios by the unified 
formalism given here. 



3. Hints and Limitations 

We need to discuss how limiting is the original assumption of a Gaussian process. 
The Gaussian assumption enters in two ways: First, it provides an "automatic" means of 
interpolating across the time intervals between measured A and B data points (which don't 
in general line up after shifting one by r). Experience has shown (also see Press, Rybicki, 
and Hewitt 1992a,b) that this use is quite robust - the interpolation is sensible and not 
very different from any other sensible method. 

Second, the Gaussian assumption is used in associating x 2 values (and, more 
importantly A% 2 values) to probabilities. One should definitely be suspicious of this 
association in the tails of the distribution. If, however, one takes the resulting probability 
distributions as indicative of central value and uncertainty, rather than as correct in detail, 
then one is on relatively safer ground: The procedures described are essentially \ 2 parameter 
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estimations, and such estimations (in the limit of large numbers of data points, when the 
central limit theorem can apply) do not require any additional Gaussian assumptions. 

As in any x 2 parameter estimation, the method is valid only if the error model is close 
to correct. A simple test to be passed is whether the values of x 2 m the exponentials of 
equations flT9]) and (pD|) are consistent with the number of data points, i.e., whether the 
reduced x 2 is close to unity. When this is the case, we have found that additional fine 
tunings of the error model (fiddling the functional form of the covariance in the matrix S, 
or tradeoffs between adjusting the correlation model embodied in S and the noise model 
embodied in N) have little effect on the output P(R) probability functions. If, however, 
the original reduced x 2 is n °t close to unity (whether too low or too high), then any kind of 
rescaling procedure must be viewed as introducing unknown biases into the method given 
here. 

Indeed, the reason that we restrict ourselves, in the next Section, to the data of Kundic 
et al. (1995, 1997) is that for other data sets (e.g., Vanderriest et al. 1989) we have not 
been able to formulate a satisfactory and self-consistent error model (S and N matrices). 

A final technical note is to point out that the calculations implied by equations (|19|) 
and (pUD can all be done using the "fast methods" for inverting S + N matrices described by 
Rybicki and Press (1995). These fast methods restrict the correlation model to a particular 
functional form, basically requiring the correlation structure function to be a linear function 
of lag time. However, this approximation appears to be adequate for the 0957+561 data, at 
least for these purposes, and the resulting reduction in computation time is enormous, and 
would become essential for very large data sets. 



4. Application to Kundic et al. Data 

Figure 1 shows the results of applying the analysis described in the previous two 
sections to the data set published by Kundic et al. (1995, 1997). First we convert 
the data and error bars from magnitudes to fluxes a(t) and b(t). Next, we use the A 
component light curves (separately in the g and r colors) to estimate a structure function 
V(r) = ([a(t) — a(t + r)] 2 ). Next, we use this structure function, along with the errors, 
to construct the S and N matrices appropriate for the "fast" method (Rybicki and Press, 
1995). It is at this stage that the Gauss-Markov (unbiased) limit is taken (see discussion 
before equation [12], above) . 

The upper panel of the figure plots P{R) as a function of R for both the DCMR 
(equation plotted as the "narrow" distributions) and the ACMR (equation [05]; plotted 



- 9- 



as the "broad" distributions). In each case, the results for the g-band data are shown as 
solid curves, while the r-band data is shown as dotted curves. 

Our DCMR values closely reproduce the B/A flux ratios quoted by Kundic et al. in 
both the r and g bands, and our errors (widths of curves shown) are comparable to the 
latter's quoted errors. Our ACMR values - novel to this work - are seen to be compatible, 
in both g and r, with the g-band DCMR. Although there is no strict incompatibility among 
all four values (except the two DCMRs, as previously remarked) the distributions in Figure 
1 tend to support the conclusion that the r-band DCMR is "odd man out" . 

The lower panel in Figure 1 shows how many magnitudes of contamination (relative to 
the time-average flux) would need to be present in the B image to move the DCMR peak 
from its plotted location to any other location. By definition, the curves go through zero 
at the DCMR peak centers. One sees that about 0.1 mag is required in r-band to shift the 
DCMR to compatibility with g-band; however, even without shifting, the r-band DCMR is 
at least marginally compatible with the r-band ACMR. 



5. Discussion and Conclusions 

While these data, in this analysis, do not support any very definitive conclusions, we 
may make the following remarks: 

Occam's razor would seem to indicate that the r-band light curve of Kundic et al. 
has about 0.1 mag of residual, unsubtracted, constant light, as perhaps from unmodeled 
small-scale variations in the lens galaxy surface brightness. If this is the case, then all 
the data are compatible with a single magnification ratio for both colors and for both the 
fluctuating and constant pieces. This in turn suggests an accretion disk scale much smaller 
than the microlensing scale, in accord with theoretical prejudice. The utility of the ACMRs 
is that, taken together, they strongly favor the hypothesis that the g-band magnification 
ratio is the correct one, and that nothing more exotic is going wrong. We note, however 
(per E. Turner, private communication), that the galaxy Gl is something like 2 magnitudes 
fainter than component B in r band; thus the amount of unsubtracted light would need to 
be comparable to the total brightness of Gl, which seems quite unlikely. 

It is up to the observers, not us, to decide whether Occam's razor should rule in 
this case. If 0.1 mag of residual is not possibly present, then we must conclude that the 
accretion disk scale is comparable to the microlensing scale, and that the constant r-band 
part of the disk is more strongly magnified than (at least some of) the other three regions. 
It seems likely on physical grounds that the fluctuating regions should be smaller than the 
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constant regions, and that the g-band regions should be smaller than the r-band regions 
(temperature decreasing outward in the disk). For the larger (r-band and constant) region 
to have a higher magnification ratio than an enclosed smaller region, the larger region must 
extend to a place where the magnification is a superlinear function of position in the sky. 
This might indicate at least a fair chance of the B image passing through a caustic in 
the near (~ 10 year) future. This possibility, as well as the reconciliation of the relative 
constancy of the magnification ratio over the last 15 years, will be explored by Monte Carlo 
simulations in another paper (Press and Kochanek, in preparation). 

This work was supported in part by NSF grant PHY-9507695 at Harvard, and 
PHY-9513835 at the Institute for Advanced Study, whose hospitality is acknowledged. 
We thank Chris Kochanek, Ed Turner, Ramesh Narayan, and Paul Schechter for helpful 
discussions. 

REFERENCES 

Chang, K., and Refsdal, S. 1979, Nature, 282, 561. 

Conner, S.R., Lehar, J., and Burke, B.F. 1992, Ap. J. Lett., 387, L61. 

Garrett, M.A., Calder, R.J., Porcas, R.W., King, L.J., Walsh, D., and Wilkinson, P.N. 
1994, MNRAS, 270, 457. 

Gott, J.R., III 1981, Ap. J. 243, 140. 

Kochanek, C.S., Kolatt, T.S., and Bartelmann, M. 1996, Ap.J., 473, 610. 
Kundic, T., et al. 1995, Ap. J. Lett. 455, L5. 
Kundic, T., et al. 1997, Ap. J. 482, 75. 

Porcas, R.W., Booth, R.S., Browne, I.W.A., Walsh, D., and Wilkinson, P.N. 1979, Nature, 
282, 385. 

Press, W.H., and Rybicki, G.B. 1997, in Astronomical Time Series, Eds. D. Maoz, A. 
Sternberg, and E.M. Leibowitz (Dordrecht: Kluwer), 61. 

Press, W.H., Rybicki, G.B., and Hewitt, J.N. 1992a, Ap. J., 385, 404. 

Press, W.H., Rybicki, G.B., and Hewitt, J.N. 1992b, Ap. J., 385, 416. 



Rybicki, G.B., and Press, W.H. 1992, Ap. J., 398, 169. [RP92] 
Rybicki, G.B., and Press, W.H. 1995, Phys. Rev. Lett., 74, 1060. 

Vanderriest, C, Schneider, J., Herpe, G., Chevreton, M., Moles, M., and Wlerick, G. 1989, 
Astron. Astrophys., 215, I. 

Walsh, D., Carswell, R.F., and Weymann, R.J. 1979, Nature, 279, 381. 

Young, P., Gunn, J.E., Kristian, J., Oke, J.B., and Westphal, J.A. 1980, Ap. J. 241, 507. 



This preprint was prepared with the AAS IATeX macros v4.0. 



Figure Captions 
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Fig. 1. — Results of applying the formalism of this paper to the 0957+561 data of Kundic 
et al. (1997). The upper panel shows Bayesian probability distributions for the B/A 
magnification ratios in two colors (solid versus dotted curves) and by two analysis techniques. 
The narrow distributions reproduce traditional analyses that look for a single flux ratio 
relating the A and B light curves. These give statistically narrow results (as shown), but 
have unknown systematics associated with (e.g.) the background subtraction of light from 
the lensing galaxy, especially the B component in r color. The broad distributions, new 
in this work, are the magnitude ratios obtained by using only the fluctuating part of the 
measured light curves. While statistically less accurate, these are completely insensitive to 
background subtraction. The lower panel shows, for each color, the amount of constant 
light necessary to subtract from the B image to move the narrow distributions to different 
magnification ratios. See text for further interpretation of these results. 



